Efficient Behavior Learning Based on State Value Estimation of Self and Others

Authors

Yasutake Takahashi

Kentarou Noma

Minoru Asada

Abstract

Existing reinforcement learning methods suffer seriously from the curse of dimensionality, especially when they are applied to multiagent dynamic environments. RoboCup competition is a typical example, since other agents and their behavior easily cause the state and action spaces to explode. This paper presents a modular learning method for a multiagent environment by which the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas are as follows. First, a two-layer hierarchical system with multiple learning modules is adopted to reduce the size of the sensor and action spaces. The state space of the top layer consists of the state values from the lower layer, and macro actions are used to reduce the size of the physical action space. Second, the state of another agent, namely how close it is to its own goal, is estimated by observation and used as a state variable in the top-layer state space to realize cooperative/competitive behavior. The method is applied to a 4 (defense team) on 5 (offense team) game task, and the learning agent (a passer on the offense team) successfully acquired teamwork plays (pass and shoot) within a much shorter learning time.

Keywords: reinforcement learning, cooperative/competitive behavior acquisition, multi-agent system, modular learning system, RoboCup
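The top-layer state described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the value range [0, 1], the number of quantization bins, and every name below (`quantize`, `top_layer_state`, the toy value functions) are assumptions made for illustration. The sketch only shows the general idea that each lower-layer module contributes one state value, and that the estimated state value of another agent is appended as one more state variable.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a lower-layer module exposes a state value function
# that maps an observation to a value in [0, 1], where 1 is assumed to
# mean "close to this module's goal".
ValueFn = Callable[[Sequence[float]], float]


def quantize(value: float, bins: int) -> int:
    """Map a value in [0, 1] to one of `bins` discrete levels."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * bins), bins - 1)


def top_layer_state(
    self_obs: Sequence[float],
    other_obs: Sequence[float],
    self_modules: Sequence[ValueFn],
    other_module: ValueFn,
    bins: int = 3,
) -> Tuple[int, ...]:
    """Build a discrete top-layer state from lower-layer state values."""
    # One state variable per lower-layer module of the learner itself.
    own = [quantize(v(self_obs), bins) for v in self_modules]
    # One extra variable: the other agent's state value, estimated here
    # by applying a value function to what is observed of that agent.
    other = quantize(other_module(other_obs), bins)
    return tuple(own + [other])


# Toy value functions (assumed): the value rises as a distance shrinks.
shoot_value = lambda obs: 1.0 - obs[0]
pass_value = lambda obs: 1.0 - obs[1]
receiver_value = lambda obs: 1.0 - obs[0]

state = top_layer_state(
    self_obs=[0.9, 0.2],
    other_obs=[0.1],
    self_modules=[shoot_value, pass_value],
    other_module=receiver_value,
)
print(state)  # (0, 2, 2)
```

With two own modules, one estimated value for another agent, and three bins, the top layer in this sketch has only 3^3 = 27 states regardless of how large the raw sensor space is, which is the kind of reduction the abstract refers to.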

Similar resources

Model Based Method for Determining the Minimum Embedding Dimension from Solar Activity Chaotic Time Series

Predicting future behavior of chaotic time series system is a challenging area in the literature of nonlinear systems. The prediction's accuracy of chaotic time series is extremely dependent on the model and the learning algorithm. On the other hand the cyclic solar activity as one of the natural chaotic systems has significant effects on earth, climate, satellites and space missions. Several m...

Change Point Estimation of the Stationary State in Auto Regressive Moving Average Models, Using Maximum Likelihood Estimation and Singular Value Decomposition-based Filtering

In this paper, for the first time, the subject of change point estimation has been utilized in the stationary state of auto regressive moving average (ARMA) (1, 1). In the monitoring phase, in case the features of the question pursue a time series, i.e., ARMA(1,1), on the basis of the maximum likelihood technique, an approach will be developed for the estimation of the stationary state’s change...

Spiral Development of Behavior Acquisition and Recognition Based on State Value

Both self-learning architecture (embedded structure) and explicit/implicit teaching from other agents (environmental design issue) are necessary not only for one behavior learning but more seriously for life-time behavior learning. This paper presents a method for a robot to understand unfamiliar behavior shown by surrounding players through the collaboration between behavior acquisition and re...

An Improved Particle Swarm Optimizer Based on a Novel Class of Fast and Efficient Learning Factors Strategies

The particle swarm optimizer (PSO) is a population-based metaheuristic optimization method that can be applied to a wide range of problems but it has the drawbacks like it easily falls into local optima and suffers from slow convergence in the later stages. In order to solve these problems, improved PSO (IPSO) variants, have been proposed. To bring about a balance between the exploration and ex...

Educational approaches to social-emotional learning in schools

The School-based interventions seek to identify and administrate the affecting changes on important factors of classroom processes that predict student’s academic and social-emotional development. One of the affecting factors contributing to the improvement of the classroom quality and interactions is social-emotional learning. Social-emotional learning processes for integrating of thought, fee...

Journal:

Advanced Robotics

Volume 22

Publication date: 2008
